Is there a way to store the return values after each pipeline stage in a format other than pickle?
stuff is a package that has my local modules - I've added it to my path by sys.path.insert, though here it isn't able to unpickle
btw here is the content of the imported file:
import
torch
from
torchvision
import
datasets, transforms
import
os
MY_GLOBAL_VAR = 32
def my_dataloder
():
return
torch.utils.data.DataLoader(
datasets.MNIST(os.path.join('./', 'data'), train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor()
])),
batch_size=32, shuffle=True)
How do we close pipelinedecorators?
It is showing running even after pipeline was completed
No, it is supposed to have its status updated automatically. We may have a bug. Can you share some example code with me, so that i could try to figure out what is happening here ?
can you share with me an example or part from your code ? I might miss something in wht you intend to achieve
I tried it - it works for a library that you can install, not for something local I suppose
Umm I suppose that won't work - this package consists of .py scripts that I use for a set of configs and Utils for my model.
Hey We figured a temporary solution - by importing the modules and reloading the contents of the artefact by pickle. It still gives us a warning, though training works now. Do send an update if you find a better solution
How would you structure PyTorch pipelines in clearml? Especially dealing with image data
However, I use this to create an instance of a dataloader(torch) this is fed into my next stage in the pipeline - though I import the local modules and add the folders to the path it is unable to unpickle the artifact
TenderCoyote78
the status should normally be automatically updated . Do all the steps finish successfully ? And also the pipeline ?
Yep, the pipeline finishes but the status is still at running . Do we need to close a logger that we use for scalers or anything?
you can also specify a package, with or without specifying its version
https://clear.ml/docs/latest/docs/references/sdk/task#taskadd_requirements
I'm facing the same issue, is there any solution to this?
Here's the code, we're trying to make a pipeline using PyTorch so the first step has the dataset that ’ s created using ‘stuff’ - a local folder that serves as a package for my code. The issue seems to be in the unpicking stage in the train function.
have you tried to add the requirements using Task.add_requirements( local_packages ) in your main file ?
i managed to import a custom package using the same way you did : i have added the current dir path to my system
i have a 2 steps pipeline :
- Run a function from a custom package. This function returns a Dataloader (built from torchvision.MNIST) 2) This step receives the dataloader built in the first step as a parameter ; it shows random samples from itthere has been no error to return the dataloader at the end of step1 and to import it at step2. Here is my code :
` from clearml import PipelineDecorator, Task
@PipelineDecorator.component(return_values=['dl'], cache=True,
repo='/home/xxxxxxxx/ClearML/Slack',
packages=['clearml==1.4.1'])
def step_one_IMPORT():
import sys
sys.path.insert(0,'/home/xxxxxxx/ClearML/Slack')
import omamitesh_import
print('==> STEP1: Import the custom import file')
print(f'Imported variable: {omamitesh_import.MY_GLOBAL_VAR}')
dl = omamitesh_import.my_dataloder()
print('Dataloader imported with success')
return dl
@PipelineDecorator.component(return_values=[], cache=True, parents=['step_one_IMPORT'])
def step_two_TEST_IMPORT(dl):
import numpy as np
import PIL.Image as pil
print(f'==> STEP2: Showing DL samples ({dl})')
for (i, sample) in enumerate(dl):
r = np.random.randint(32)
img = sample[0][r].view(28, 28).numpy()
img = pil.fromarray((img * 255).astype(np.uint8))
img.show()
if i > 4:
break
@PipelineDecorator.pipeline(name='220620', project='Issues Repro Pipeline', version='0.0.1',
default_queue='queue-1', pipeline_execution_queue='queue-2')
def pipeline():
# building the pipeline
dl = step_one_IMPORT()
step_two_TEST_IMPORT(dl)
if name == "main":
project_name = 'Issues Repro'
task_name = '220620'
task = Task.init(project_name=project_name, task_name=task_name)
PipelineDecorator.run_locally()
pipeline()
print('pipeline completed') `
Hey so I was able to get the local .py files imported by adding the folder to my path sys .path
Though as per your docs the add_requirements is for a requirements .txt