Installation of Salesforce Libraries

!pip install simple_salesforce
!pip install salesforce_bulk

Importing Commom Libraries

import json
import datetime as dt
import pandas as pd
import numpy as np
import time as t

Importing SalesForce Libraries

from simple_salesforce import Salesforce,SalesforceLogin,SFType
from salesforce_bulk import SalesforceBulk
from salesforce_bulk.util import IteratorBytesIO
from urllib.parse import urlparse
from salesforce_bulk import CsvDictsAdapter

!pip show simple_salesforce

Name: simple-salesforce
Version: 1.11.1
Summary: A basic Salesforce.com REST API client.
Home-page: https://github.com/simple-salesforce/simple-salesforce
Author: Nick Catalano
Author-email: nickcatal@gmail.com
License: Apache 2.0
Location: c:\users\dell\anaconda3\lib\site-packages
Requires: requests, authlib
Required-by: salesforce-bulk

!pip show salesforce_bulk

Name: salesforce-bulk
Version: 2.2.0
Summary: Python interface to the Salesforce.com Bulk API.
Home-page: https://github.com/heroku/salesforce-bulk
Author: Scott Persinger
Author-email: scottp@heroku.com
License: MIT
Location: c:\users\dell\anaconda3\lib\site-packages
Requires: six, unicodecsv, requests, simple-salesforce
Required-by:

Importing SalesForce Credentials

import Salesforce_one
import Salesforce_two

Enter username, password and security_token (you are able to generate in using the salesforce org)

Path to generate salesforce token: My Settings->Reset My Security Token-> Click on Reset Security Token-> Token will come to your registered Email Id

Domain has two option:
1) Login : When you are using developer Org

2) Test : When you are using sandbox Org

username = Salesforce_one.username
password = Salesforce_one.password
security_token = Salesforce_one.security_token
domain = 'login'

Starting a session in the Salesforce Org

session_id, sf_instance = SalesforceLogin(username=username, password=password, security_token=security_token,domain=domain)
sf = Salesforce(instance=sf_instance,session_id=session_id)
print(username,sf)

Metadata of the Particular object and save the result in the excel or csv file

md= sf.Object_name
md_metadata = At.describe()
df_md_metadata = pd.DataFrame(At_metadata.get('fields'))
df_md_metadata.to_csv('md_metadata.csv',index=False)

These 3 are the most important column that are been used in the Salesforce Org

df_md_metadata[['label','name','type']]

These are the columns specified for the particular Object that are to be extracted from the salesforce Org

columns = """Id,ParentId,ActivityId,CreatedById,CreatedDate,LastModifiedDate,LastModifiedById,SystemModstamp,TextBody,HtmlBody,Headers,Subject,FromName,FromAddress,ValidatedFromAddress,ToAddress,CcAddress,BccAddress,Incoming,HasAttachment,Status,MessageDate,IsDeleted,ReplyToEmailMessageId,IsExternallyVisible,MessageIdentifier,ThreadIdentifier,IsClientManaged"""
columns

'Id,ParentId,ActivityId,CreatedById,CreatedDate,LastModifiedDate,LastModifiedById,SystemModstamp,TextBody,HtmlBody,Headers,Subject,FromName,FromAddress,ValidatedFromAddress,ToAddress,CcAddress,BccAddress,Incoming,HasAttachment,Status,MessageDate,IsDeleted,ReplyToEmailMessageId,IsExternallyVisible,MessageIdentifier,ThreadIdentifier,IsClientManaged'

querySQQL = "select {} from Object_name".format(columns)
print(querySQQL)

select Id,ParentId,ActivityId,CreatedById,CreatedDate,LastModifiedDate,LastModifiedById,SystemModstamp,TextBody,HtmlBody,Headers,Subject,FromName,FromAddress,ValidatedFromAddress,ToAddress,CcAddress,BccAddress,Incoming,HasAttachment,Status,MessageDate,IsDeleted,ReplyToEmailMessageId,IsExternallyVisible,MessageIdentifier,ThreadIdentifier,IsClientManaged from Emailmessage

Extract the data in one time and only 200 rows are available in one go.

response = sf.query(querySQQL)
lstRecords = response.get('records')
nextRecordsUrl = response.get('nextRecordsUrl')
print(response.get('totalSize'))
#This will print the total Size of the record that are present in the particular object

Using While Loop To extract all the rows that are present in the Object

while not response.get('done'):
    response = sf.query_more(nextRecordsUrl, identifier_is_url=True)
    lstRecords.extend(response.get('records'))
    nextRecordsUrl = response.get('nextRecordsUrl')

Converting the series of records in the python dataframe to manupilate and transform the data.

final_insert = pd.DataFrame(lstRecords).drop('attributes',axis=1)
final_insert.shape

(763771, 28)

Check number of unique id's and shape of the dataframe are equal and that will confirm that there are not duplicate rows in the dataframe

final_insert.Id.nunique()

Creating the new DataFrame and Copying all the columns to the new dataframe

Questions comes why we are doing that

Change the datatype of the column
Manuplate or Tranform the data
Take the data from one salesforce org to another org but the data is same but has different column name
It will be easily undertantable that what columns are we supposed to transfer to another org

df_em = pd.DataFrame()

df_em['Old_SF_ID__c'] = final_insert['Id']
df_em["BccAddress"]=final_insert["BccAddress"]
df_em["CcAddress"]=final_insert["CcAddress"]
df_em["ValidatedFromAddress__c"]=final_insert["ValidatedFromAddress"]
df_em["FromAddress"]=final_insert["FromAddress"]
df_em["FromName"]=final_insert["FromName"]
df_em["Headers"]=final_insert["Headers"]
df_em["HtmlBody"]=final_insert["HtmlBody"]
df_em["IsExternallyVisible"]=final_insert["IsExternallyVisible"]
df_em["Incoming"]=final_insert["Incoming"]
df_em["MessageDate"]=final_insert["MessageDate"]
df_em["Status"]=final_insert["Status"]
df_em["Subject"]=final_insert["Subject"]
df_em["TextBody"]=final_insert["TextBody"]
df_em["ToAddress"]=final_insert["ToAddress"]

This is the step to tranform or manupilate salesfore data by using for loop or if condition

Note:

In Salesforce Org we can not insert the null values. So we have to deal with the null values by filling with blank values.

df_em.fillna('',inplace = True)

df_em.shape

(472, 15)

Checking the null values

df_em.isnull().sum()

Old_MM_SF_ID__c            0
BccAddress                 0
CcAddress                  0
ValidatedFromAddress__c    0
FromAddress                0
FromName                   0
Headers                    0
HtmlBody                   0
IsExternallyVisible        0
Incoming                   0
MessageDate                0
Status                     0
Subject                    0
TextBody                   0
ToAddress                  0
dtype: int64

Login to another Salesforce Org where to transfer the data from Salesforce Org

username = Salesforce_two.username
password = Salesforce_two.password
security_token = Salesforce_two.security_token
domain = 'login'

session_id, sf_instance = SalesforceLogin(username=username, password=password, security_token=security_token,domain=domain)
sf = Salesforce(instance=sf_instance,session_id=session_id)
print(sf)

In this step we have to mention the object name that on which we have to performing the operation(insert,delete or update)

project = SFType('Object_name',session_id,sf_instance)

Using SalesforceBulk api to perform insert, delete or update operation in salesforce Org and it has the operation of using Parallel computation also.

#using Bulk Api to insert the data
bulk = SalesforceBulk(sessionId=session_id, host=urlparse('http://'+sf_instance).hostname)
job = bulk.create_insert_job("Object_name", contentType='CSV',concurrency='Parallel')
# this is to specify no of records u want to enter in each iteration
# which is totally depends on the size of the data.
for i in range(0,len(df_em),10000):  
    df=[]                        
    df = df_em[i:i+10000]
    res1=df.to_dict(orient='records')
    csv_iter = CsvDictsAdapter(iter(res1))
    batch = bulk.post_batch(job, csv_iter)
    result=bulk.wait_for_batch(job, batch)
bulk.close_job(job) 
#Most important step to close the job