I normally just upload the data to the ClearML server and then remove it locally from my machine, but I understand that isn't what you want. A quick hack was the only thing I could come up with at the moment xd. Anyway, you're welcome. Hope you find a solution.
I'm not using decorators. I have a bunch of function_steps followed by a normal task step, where I've passed a base_task_id.
I want to check the value returned by one of the function steps, and if it holds true, I want to execute the task step; otherwise I want the pipeline to end there, since the task step is the last one.
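Something like this is roughly what I'm after, assuming add_step's pre_execute_callback can be used to skip a node (the step names, parameter names and base task id below are just placeholders):

```python
from clearml import PipelineController

def skip_train_if_not_ready(pipeline, node, param_override):
    # Returning False skips this node (and anything that depends on it).
    # Assumes the override below is already resolved when the callback runs.
    return str(param_override.get("General/enough_images", "")).lower() == "true"

pipe = PipelineController(name="conditional-pipeline", project="examples", version="0.0.1")

pipe.add_step(
    name="train",
    base_task_id="<base_task_id>",   # placeholder for the real base task
    parents=["check_folder"],        # the earlier function step
    parameter_override={"General/enough_images": "${check_folder.enough_images}"},
    pre_execute_callback=skip_train_if_not_ready,
)
```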
The setup is on a single machine. I have a NAS mounted and I'm watching a folder on it; if there are sufficient images, it should publish the data. But since I was using start_remotely, the code was running somewhere else and couldn't access the folder.
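For reference, this is roughly the difference I ran into; starting the scheduler with start() keeps it on the machine that can actually see the mounted folder, while start_remotely() sends it off to an agent (the ids and names here are placeholders):

```python
from clearml.automation import TriggerScheduler

# Sketch only: a trigger that reacts to newly published datasets.
trigger = TriggerScheduler(pooling_frequency_minutes=5.0)
trigger.add_dataset_trigger(
    schedule_task_id="<training_task_id>",  # placeholder
    schedule_queue="default",
    trigger_project="datasets",
)

# start() keeps the scheduler running on this machine, so local paths
# (like the mounted NAS folder) stay accessible; start_remotely() would
# enqueue it to run on an agent elsewhere.
trigger.start()
```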
I think it downloads from the curl command.
It'll be labeled in the folder I'm watching.
Here's the screenshot TimelyPenguin76
I just want to be able to pass the output from one step as input to another step.
AgitatedDove14 Can you help me with this? Maybe something like storing the returned values or something in a variable outside the pipeline?
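Something along these lines is what I mean, following the pattern from ClearML's pipeline-from-functions examples (the step names and values are made up):

```python
from clearml import PipelineController

def step_one():
    # whatever the first step computes
    return 42

def step_two(threshold):
    print("received:", threshold)

pipe = PipelineController(name="pass-values", project="examples", version="0.0.1")

pipe.add_function_step(
    name="step_one",
    function=step_one,
    function_return=["threshold"],   # stored as an artifact of the step
)
pipe.add_function_step(
    name="step_two",
    function=step_two,
    # reference the previous step's returned value by name
    function_kwargs=dict(threshold="${step_one.threshold}"),
    parents=["step_one"],
)
pipe.start_locally(run_pipeline_steps_locally=True)
```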
My use case is that the code using pytorch saves additional info like the state dict when saving the model. I'd like to save that information as an artifact as well so that I can load it later.
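Roughly what I'm doing, as a sketch (the tiny model and optimizer are just stand-ins):

```python
import torch
from clearml import Task

task = Task.current_task()  # assumes this runs inside an existing ClearML task

model = torch.nn.Linear(4, 2)                              # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # stand-in optimizer

checkpoint = {
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}
torch.save(checkpoint, "checkpoint.pt")

# keep the extra state as a task artifact so it can be pulled back later
task.upload_artifact(name="training_state", artifact_object="checkpoint.pt")
```

Later I'd fetch it back with something like Task.get_task(task_id=...).artifacts["training_state"].get_local_copy().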
I shared the logs manually because the complete logs had my email and other details in them. If it helps, I'll share the full logs as soon as I can.
If it helps, I can try and record my steps in a video.
AnxiousSeal95 Basically it's a function step return. If I do artifacts.keys(), there are no keys, even though the step prior to it does return the output.
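This is how I'm checking, more or less (the task id and artifact name are placeholders):

```python
from clearml import Task

prev = Task.get_task(task_id="<step_task_id>")   # task of the previous function step

print(list(prev.artifacts.keys()))               # comes back empty in my case
# if that step's function_return was e.g. ["data_frame"], I'd expect:
value = prev.artifacts["data_frame"].get()
```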
I think maybe it does this because of a cache or something. Maybe it keeps a record of an older login, and when you restart the server it keeps trying to use those older details.
Also, is ClearML open source and accepting contributions, or is it just a limited team working on it? Sorry for an off-topic question.
This works, thanks. Do you have any link to where I can also see the parameters of the Dataset class or was it just on git?
This shows my situation. You can see the code on the left and the tasks called 'Cassava Training' on the right. They keep getting enqueued even though I only sent a trigger once, by which I mean I only published a dataset once.
This is the original repo which I've slightly modified.
When I try to access the server with the IP I set as CLEARML_HOST_IP, it looks like this. I set that to the IP assigned to me by the network.
I'm both printing it and writing it to a file
Well, I'm still researching how it'll work. I expect it won't be very good and will make the model's learning very stochastic in nature.
Instead, at the training stage, rather than just getting this model, I plan to use Dataset.squash to merge the previous M datasets together.
This should introduce stability in the dataset.
Also, this way our model is trained on each batch of data multiple times, but only a few times before that batch is discarded. We keep the training data fresh for co...
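A sketch of the squash idea, assuming Dataset.squash can take a list of dataset ids (the ids and names below are placeholders):

```python
from clearml import Dataset

# the last M published dataset versions (placeholders)
recent_ids = ["<dataset_id_1>", "<dataset_id_2>", "<dataset_id_3>"]

merged = Dataset.squash(
    dataset_name="training-window",
    dataset_ids=recent_ids,
)
local_copy = merged.get_local_copy()   # train on the merged snapshot
```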
I ran a training code from a github repo. It saves checkpoints every 2000 iterations. Only problem is I'm training it for 3200 epochs and there's more than 37000 iterations in each epoch. So the checkpoints just added up. I've stopped the training for now. I need to delete all of those checkpoints before I start training again.
I don't think the function itself requires a venv to run normally, but in this case it says it can't find the venv.
I would normally like it to install any requirements it needs on its own.
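Something like one of these is what I had in mind, as a sketch (the step function and package list are placeholders):

```python
from clearml import Task, PipelineController

def prep_fn():
    import pandas as pd   # imported inside the step, so the remote venv needs it
    return pd.DataFrame()

# Option 1: declare extra requirements up front so the agent installs them
# when the step runs remotely.
Task.add_requirements("pandas")

# Option 2: list packages explicitly on the function step.
pipe = PipelineController(name="deps-example", project="examples", version="0.0.1")
pipe.add_function_step(
    name="prep",
    function=prep_fn,
    packages=["pandas>=1.3", "numpy"],   # installed into the step's venv
)
```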
Can you spot something here? Because to me it still looks like it should only create a new Dataset object if the batch size requirement is fulfilled, after which it creates and publishes the dataset and empties the directory.
Once the data is published, a dataset trigger is activated in the checkbox_.... file, which creates a clearml-task for training the model.
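The publish side of it looks roughly like this (paths and names are placeholders, and I'm assuming the installed clearml version exposes Dataset.publish()):

```python
from pathlib import Path
from clearml import Dataset

WATCH_DIR = "/mnt/nas/incoming"   # placeholder for the watched NAS folder

# once enough images have accumulated, package and publish them
ds = Dataset.create(dataset_name="cassava-batch", dataset_project="datasets")
ds.add_files(path=WATCH_DIR)
ds.upload()
ds.finalize()
ds.publish()   # publishing is what the dataset trigger reacts to

# empty the watched folder so the next batch starts clean
for f in Path(WATCH_DIR).iterdir():
    if f.is_file():
        f.unlink()
```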
I don't think I changed anything.
I get what you're saying. I was considering training on just the new data to see how it works; to me that felt like the fastest way to deal with data drift. I understand it may introduce instability, however. I'm curious how other developers who have successfully set up continuous training deal with it: 100% new data, or a ratio between new and old data? And if it's the latter, which should be the majority, old data or new data?
Thus I wanted to pass the model id from the prior step to the next one.
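Roughly what I mean, as a sketch: the training step grabs the id of the model it just registered and returns it, and the next step would receive it the same way as any other returned value (names here are placeholders):

```python
from clearml import Task

def train_step():
    task = Task.current_task()
    # ... training code that saves a checkpoint goes here ...
    # the saved checkpoint registered on the task becomes an output model;
    # return its id so the next step can pick it up
    model_id = task.models["output"][-1].id
    return model_id
```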