Hey, we figured out a temporary solution - importing the modules and then reloading the contents of the artifact with pickle. It still gives us a warning, but training works now. Do send an update if you find a better solution.
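Roughly something like this - just a sketch; the path, task id and artifact name are placeholders, not from our actual code:
```
import pickle
import sys

# make the local package importable first, so pickle can resolve its modules
sys.path.insert(0, '/path/to/local/package')  # placeholder path
import stuff  # noqa: F401 - the local package the pickled object refers to

from clearml import Task

# fetch the pickled artifact produced by the previous step and load it manually
task = Task.get_task(task_id='<step-task-id>')            # placeholder task id
artifact_path = task.artifacts['dl'].get_local_copy()     # placeholder artifact name

with open(artifact_path, 'rb') as f:
    dl = pickle.load(f)  # unpickling works now because the module is already imported
```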
Can you share an example or part of your code with me? I might be missing something in what you intend to achieve.
btw here is the content of the imported file:
import torch
from torchvision import datasets, transforms
import os

MY_GLOBAL_VAR = 32

def my_dataloder():
    return torch.utils.data.DataLoader(
        datasets.MNIST(os.path.join('./', 'data'), train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ])),
        batch_size=32, shuffle=True)
stuff is a package that has my local modules - I've added it to my path with sys.path.insert, but here it isn't able to unpickle.
Here's the code. We're trying to make a pipeline using PyTorch, so the first step has the dataset that's created using 'stuff' - a local folder that serves as a package for my code. The issue seems to be in the unpickling stage in the train function.
I managed to import a custom package the same way you did: I added the current dir path to my system path.
I have a 2-step pipeline:
1) Run a function from a custom package. This function returns a DataLoader (built from torchvision's MNIST dataset).
2) This step receives the DataLoader built in the first step as a parameter; it shows random samples from it.
There has been no error returning the DataLoader at the end of step 1 or importing it in step 2. Here is my code:
```
from clearml import PipelineDecorator, Task

@PipelineDecorator.component(return_values=['dl'], cache=True,
                             repo='/home/xxxxxxxx/ClearML/Slack',
                             packages=['clearml==1.4.1'])
def step_one_IMPORT():
    import sys
    sys.path.insert(0, '/home/xxxxxxx/ClearML/Slack')
    import omamitesh_import

    print('==> STEP1: Import the custom import file')
    print(f'Imported variable: {omamitesh_import.MY_GLOBAL_VAR}')
    dl = omamitesh_import.my_dataloder()
    print('Dataloader imported with success')
    return dl

@PipelineDecorator.component(return_values=[], cache=True, parents=['step_one_IMPORT'])
def step_two_TEST_IMPORT(dl):
    import numpy as np
    import PIL.Image as pil

    print(f'==> STEP2: Showing DL samples ({dl})')
    for i, sample in enumerate(dl):
        r = np.random.randint(32)
        img = sample[0][r].view(28, 28).numpy()
        img = pil.fromarray((img * 255).astype(np.uint8))
        img.show()
        if i > 4:
            break

@PipelineDecorator.pipeline(name='220620', project='Issues Repro Pipeline', version='0.0.1',
                            default_queue='queue-1', pipeline_execution_queue='queue-2')
def pipeline():
    # building the pipeline
    dl = step_one_IMPORT()
    step_two_TEST_IMPORT(dl)

if __name__ == "__main__":
    project_name = 'Issues Repro'
    task_name = '220620'
    task = Task.init(project_name=project_name, task_name=task_name)
    PipelineDecorator.run_locally()
    pipeline()
    print('pipeline completed')
```
Umm, I suppose that won't work - this package consists of .py scripts that I use for a set of configs and utils for my model.
I tried it - it works for a library that you can install, not for something local I suppose
However, I use this to create an instance of a (torch) DataLoader that is fed into the next stage of the pipeline - even though I import the local modules and add the folders to the path, it is unable to unpickle the artifact.
I'm facing the same issue, is there any solution to this?
How do we close PipelineDecorator pipelines?
It is still showing as running even after the pipeline was completed.
Hey, so I was able to get the local .py files imported by adding the folder to my path with sys.path
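i.e. something along these lines at the top of the step (the path and module name here are just examples):
```
import sys

# make the folder containing the local .py files importable inside the step
sys.path.insert(0, '/path/to/my/local/modules')

import my_local_module  # example module name - now resolved from that folder
```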
you can also specify a package, with or without specifying its version
https://clear.ml/docs/latest/docs/references/sdk/task#taskadd_requirements
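e.g. something like this in the main script (package names and versions below are just examples; as far as I recall it has to be called before Task.init):
```
from clearml import Task

# force specific packages into the task's "Installed Packages"
Task.add_requirements('torch', '1.11.0')   # with an explicit version
Task.add_requirements('torchvision')       # or without one

# it can also point at a requirements file
Task.add_requirements('/path/to/requirements.txt')

# ...then Task.init(...) as usual
```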
Yep, the pipeline finishes but the status is still at "running". Do we need to close a logger that we use for scalars, or anything like that?
Though as per your docs, add_requirements is for a requirements.txt
No, it is supposed to have its status updated automatically. We may have a bug. Can you share some example code with me, so that I could try to figure out what is happening here?
TenderCoyote78
The status should normally be updated automatically. Do all the steps finish successfully? And the pipeline too?
Is there a way to store the return values after each pipeline stage in a format other than pickle?
hey WickedElephant66 TenderCoyote78
I'm working on a solution, just hold on, I'll update you asap
How would you structure PyTorch pipelines in ClearML? Especially when dealing with image data
Have you tried to add the requirements using Task.add_requirements(local_packages) in your main file?