google 字符集BigQuery 如何匹配字符

spark SQL跟aws redshift,google bigQuery对比如何?自己用能否替代 - 知乎20被浏览843分享邀请回答01 条评论分享收藏感谢收起Access denied | www.bimeanalytics.com used Cloudflare to restrict access
Please enable cookies.
What happened?
The owner of this website (www.bimeanalytics.com) has banned your access based on your browser's signature (41e9b88f3ba3519a-ua98).How to replace SAP BW with Google BigQuery?
| LinkedIn
“90% of the data ever created was created in the last 2 years" - unknown&
Why change?
“The data is getting bigger and we just can’t cope. &No matter how many more blades we add to the server the queries still run slowly. &We've stripped out all the unrequired fields, only uploaded two years worth of data, optimised the cubes so only the most relevant dimensions are available and finally tuned the queries so they would only pull back the most frequently asked requests. &We can barely get this data set running and the management now want it merged with external sales data to measure the impact of their latest campaign. &Do they realise how difficult that is? &Taking it offline into MS Excel/Access might workaround the problem for now.”
Granted, maybe this is a bit hypothetical (or maybe it isn't). &One thing is for certain, the way we used to do data warehousing and business intelligence in the past may not continue to work for us in the future. &Disruptive technologies such as Digital, Social and the Internet of Things (IoT) mean that the sheer volume and variety of data is becoming overwhelming. &&
How do you fish for business insights in an ocean of data? &How can you deal with the ever growing volume of internal data being created by your core systems such as SAP? How do you pool this data and fish it with the rest of the ocean to create new business insights that could never be obtained before?
This post will not solve these questions but it may give you a glimmer of hope. &It &is based on a short feasibility project to understand if it is possible to replace SAP’s business warehouse with a big-data solution, namely Google . &
Best to get my caveat in upfront. &I'm not in any way attempting to marginalise the skills, intelligence and sheer tenacity required to build, tune and run a data warehouse (I have many SAP BW, SAS and Cognos scars to show from my previous endeavours). &I'm merely attempting in this post to challenge the status-quo and maybe to do some thinking outside (or even burn) the box. &
This pilot would not have been possible without specialist knowledge and expertise from
on BigQuery and report writing and
for data extraction from SAP. &I would highly recommend working with these guys.
Apache Hadoop versus Google BigQuery?
OK now for the very short history &of Big Data&(apologies for any inaccuracies). &Once upon a time two opposing technologies
and Google
traced their family tree and found that they were actually related. &They both had the same daddy - Google. &
Hadoop was originally based on whitepapers that Google released in December 2004, introducing MapReduce and the Google File System and running on commodity hardware. &Google had reportedly disclosed a 'diluted' version of their established technology which they had been using for many years.
In 2006, while working at Yahoo, Doug Cutting and Michael&Cafarella&created Hadoop. The Apache foundation adopted this open-source technology and released it on the world. &Hadoop took some time to get traction but has now developed into a really strong product with a massive ecosystem of tools to support it.
Virtually all of the commercial ‘Big Data’ platforms offered by leading vendors ( for yourself) are based on&open-source Hadoop. &These vendors provide the cloud storage, infrastructure, management and processing power required to productise Hadoop.
BigQuery on the other hand may be a more legitimate ‘Big Data’ heir. &You don’t have to purchase any additional licences or configure any infrastructure to use it. &You simply pay for what you use and Google manage the rest. &Google may have also kept their best technology for their own product.
The BigQuery stack
I've seen business intelligence (BI) diagrams with up to 10 layers, which I find a bit daunting and confusing. &For the purpose of this post I’ll simplify the stack to 3 layers:
The ETL (Extract, Transform, Load) layer [on-premise] - automatically gets the transactional/master data from the source (SAP ECC6 in our case), cleanses and organises it, then loads it into a database. &We needed a BigQuery compatible ETL tool to do this. &We chose
(open-source) although other tools such as
are also compatible.
The database layer [cloud]&- holds the data in logical tables so that it can be 'interrogated'. &BigQuery provided the muscle for this part.
The reporting layer [cloud]&- the results of the 'interrogation'. &Again a number of BigQuery
are available. &We chose
for the pilot. &For simple regular reports we were also able to connect
as a (zero cost) reporting layer. &Many reporting tools (for different purposes/mining/analysis techniques) could be &simultaneously connected to the same 'one version of the truth' BigQata database.
5 Challenges to overcome
The ETL process (as expected) was the most challenging aspect of the project. &It also consumed the most time (70%) and resource. &Included in this challenge was the BigQuery data schema design. &Do not underestimate this step. &Clever data architecture is required to make it work - all the previous BI skills I commended earlier come into play.
To reduce query time and cost, staging tables must be included in the BigQuery design. &Separate data tables were created for each year for reporting purposes. &BigQuery alloys for table unions, this feature can also be used to add additional data sets, such as 'forecasts'.&BigQuery also supports a syntax that allows other datasets (e.g. customer, consumer, social) to be easily joined to existing data without requiring pre-processing or reloading.
The SAP Java Connector (JCO) is critical to the ETL process. &This took approximately 5 minutes to upload 1 months data. &Version 3+ of the connector allows for loading large volumes of data. &If you have to use version 2.0 you will need to chunk the data into smaller loads.
BigQuery caching both decreases response time and also lowers processing costs. &It is very important to ensure that there is a high caching ratio (90%+ works) built into the design.
Incremental updates and the ability to restate history (retrospectively re-configure master-data hierarchies). &Incremental updates relied on the (indexed) time-stamp field of the date the record was created in SAP. &Master data tables were held separately and joined to the transactional data as reports were executed (BigQuery is capable of supporting joins). &This allowed for flexible master data changes.&
Success and reasons to continue
We successfully created an end-to-end automated reporting solution. & & SAP-Talend-BigQuery-Bime (& Sheets). &The BIME report results reconciled with the equivalent SAP BW report.
We didn't even test the power of BigQuery as a full years worth of data from SAP barely crossed the BigQuery free allowance of resources. &We didnt even touch the sides of what was possible. &We have no doubt that we could load all of our historical and ongoing data with no drop in performance.
Reports ran c.20 times faster than the equivalent reports in SAP BW (running on a multi-blade server) at less than 10 seconds each. &It was also much faster than this in the BigQuery database layer, the 10 seconds being mainly in the reporting front end (which I'm sure could be further tuned). &After the first time the reports were ran on fresh data they were instantaneous (caching kicked in). &
BIME could be used to provide Executive reports on any mobile device. &Imagine instant access to the pulse of the business from an iPhone.
The cost savings of doing BI in this way is significant when compared to traditional methods. &Cached properly the BigQuery costs are a small fraction of the running and infrastructure costs most of us are used to. &This also means that a step change in volume and variety of data sources if economically viable.
To replace SAP BW in it's entirety with BigQuery would be a 6 month project, ETL and BigQuery data architecture accounting for the lions share of time.
Conclusion
Our pilot was successful and proves that BigQuery is a viable option for enterprise BI. &The glimmer of hope that I mentioned earlier is more than just a glimmer. &Imagine joining all those disparate data pools (financial, customer, campaign, social, IoT) &into an enterprise data ocean and being to data mine this to provide rapid clear insights that were never possible before. &Imagine doing this with a clear cost benefit advantage.
This is the end of this post. &It is based completely on my own personal opinions. &&&In previous posts I have covered how to introduce
, Tips for
and how to &successfully manage an
.&&I hope to keep posting further related topics if I continue to get interest I have received so far - Thank you!
Looking for more of the latest headlines on LinkedIn?& Google BigQuery Public Datasets (&&)
Google BigQuery is not only a fantastic tool to analyze data, but it also has a repository of public data, including GDELT world events database, NYC Taxi rides, GitHub archive, Reddit top posts, and more.
Top Stories Past 30 Days
Most Popular
Most Shared
& Google BigQuery Public Datasets (&&)Login to unlock InfoQ's new features
Stay up to date and get notified
Like your favorite content
Follow your favorite editors and peers
1,201,360 Apr unique visitors
Featured in
Development
Featured in
Architecture & Design
Featured in
AI, ML & Data Engineering
Featured in
Culture & Methods
Featured in
Software Development Conference
You are here:
Google BigQuery Content on InfoQ
Presentations about&Google BigQuery
AI, ML & Data Engineering
697 Followers
Neville Li
0&Followers
May 26, 2017
Neville Li tells the Spotify’s story of migrating their big data infrastructure to Google Cloud, replacing Hive and Scalding with BigQuery and Scio, which helped them iterate faster.
News about&Google BigQuery
AI, ML & Data Engineering
697 Followers
Alex Giamas
8&Followers
Jan 05, 2017
238 Followers
Kent Weare
9&Followers
Feb 28, 2016
238 Followers
Kent Weare
9&Followers
Nov 22, 2015
Abel Avram
7&Followers
Jan 29, 2015
Alex Giamas
8&Followers
Sep 30, 2014
Michael Hausenblas
1&Followers
Feb 14, 2014
Roopesh Shenoy
0&Followers
Nov 21, 2012
Architecture & Design
Culture & Methods
AI, ML & Data Engineering
General Feedback
Advertising
InfoQ.com and all content copyright ©
C4Media Inc. InfoQ.com hosted at , the best ISP we've ever worked with.
Recover your password...
InfoQ Account Email
Follow your favorite topics and editors
Quick overview of most important highlights in the industry and on the site.
More signal, less noise
Build your own feed by choosing topics you want to read about and editors you want to hear from.
Stay up-to-date
Set up your notifications and don't miss out on content that matters to you
Note: If updating/changing your email, a validation request will be sent
Company name:
Keep current company name
Update Company name to:
Company role:
Keep current company role
Update company role to:
Company size:
Keep current company Size
Update company size to:
Country/Zone:
Keep current country/zone
Update country/zone to:
State/Province/Region:
Keep current state/province/region
Update state/province/region to:
Subscribe to our newsletter?
Subscribe to our architect newsletter?
Subscribe to our industry email notices?

我要回帖

更多关于 google浏览器字符集 的文章

 

随机推荐